Out of the 24 counties in the state of Maryland, Montgomery County is the most populous. It boasts the 20th highest household income in the country, and interestingly, the highest percentage of residents over the age of 25 who hold post-graduate degrees (Source). High household income and education indicate that Montgomery County could be a good place to live - but they aren't the only things to consider.
When evaluating a place to work/live/visit, safety is paramount, and one of the largest components of safety is crime. Montgomery County is a large geographical area, covering 507 square miles with a population of over 1 million residents, and therefore it is not sufficient to simply look at overall crime rate statistics to determine if it's a safe place to live. We asked ourselves the following questions:
In order to answer these questions, we embarked on a data analysis journey to determining whether Montgomery County is a "safe" place to live, hoping to find evidence that crime was trending downward, the police department was effective at containing and addressing crime, and if there are any demographic factors that contribute to different patterns within the county.
The libraries that we will use for the project are <....>.
First, we will need to import all required libraries and set up prerequisites.
# Import core packages
import pandas as pd
import numpy as np
import re
import datetime as dt
from matplotlib import pyplot as plt
import seaborn as sns
import requests
from matplotlib import animation
from matplotlib.animation import FuncAnimation
from IPython.display import HTML, Javascript
# Import packages for visualization
import folium
from folium import plugins
from folium.plugins import MarkerCluster
from folium.plugins import HeatMapWithTime
%matplotlib widget
The Montgomery County government has a comprehensive collection of datasets that report on county demographics, businesses, public safety, and more. From that, we were able to find a dataset that had the potential to answer our questions in the form of a csv file that we can download from their website.
This dataset is updated daily with new crime reports, and contains data that spans from 2017 to the present. It contains 282,000 rows, and has 30 columns containing detailed information about each incident - including start time, number of victims, location, and type of crime. From this plethora of data we will be able to drill down to the most important variables and indicators that we will use to answer our questions.
We downloaded the final version that is used in our analysis on November 08, 2023.
# Load Montgomery dataset
montgomery_crime_df = pd.read_csv('Montgomery_Crime.csv', index_col = ['Incident ID'], dtype={'PRA': str})
The first step of this process is to examine our dataset and understand its main characteristics, which will help us formulate some initial assumptions. This will help us identify any obvious errors, identify patterns, detect outliers, and lay the foundation for the data cleaning tasks that will need to be done.
# Inspect first 5 observations
montgomery_crime_df.head(5)
| Offence Code | CR Number | Dispatch Date / Time | Start_Date_Time | End_Date_Time | NIBRS Code | Victims | Crime Name1 | Crime Name2 | Crime Name3 | ... | Location | Council Districts | Councils | Communities | Zip Codes | Municipalities | Council Districts_from_i23j_3mj8 | Service Regions | Montgomery County Boundary | Council Districts 7 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Incident ID | |||||||||||||||||||||
| 201452140 | 1205 | 230065034 | 11/08/2023 03:20:12 AM | 11/08/2023 03:20:00 AM | NaN | 120 | 1 | Crime Against Property | Robbery | ROBBERY - STREET-OTHER WEAPON | ... | (38.9918, -77.0241) | 5.0 | 5.0 | 7.0 | 3065.0 | 1.0 | 5.0 | 1.0 | 1.0 | 7.0 |
| 201452138 | 2203 | 230065033 | 11/08/2023 02:43:30 AM | 11/08/2023 02:30:00 AM | 11/08/2023 02:42:00 AM | 220 | 1 | Crime Against Property | Burglary/Breaking and Entering | BURGLARY - FORCED ENTRY-NONRESIDENTIAL | ... | (39.055, -77.1183) | 3.0 | 3.0 | 19.0 | 60.0 | 1.0 | 3.0 | 3.0 | 1.0 | 7.0 |
| 201452136 | 1305 | 230065028 | 11/08/2023 01:08:41 AM | 11/08/2023 12:08:00 AM | 11/08/2023 01:50:00 AM | 13A | 1 | Crime Against Person | Aggravated Assault | ASSAULT - AGGRAVATED - NON-FAMILY-OTHER WEAPON | ... | (39.0777, -77.14) | 3.0 | 3.0 | 25.0 | 60.0 | 12.0 | 3.0 | 3.0 | 1.0 | 4.0 |
| 201452119 | 2601 | 230065015 | 11/07/2023 09:15:52 PM | 11/07/2023 09:15:00 PM | NaN | 26A | 1 | Crime Against Property | False Pretenses/Swindle/Confidence Game | FRAUD - CONFIDENCE GAME | ... | (39.0834, -77.0652) | 4.0 | 4.0 | 59.0 | 6.0 | 1.0 | 4.0 | 4.0 | 1.0 | 2.0 |
| 201452132 | 1304 | 230065019 | 11/07/2023 09:06:46 PM | 11/07/2023 09:06:00 PM | 11/08/2023 12:00:00 AM | 13A | 5 | Crime Against Person | Aggravated Assault | ASSAULT - AGGRAVATED - NON-FAMILY-GUN | ... | (39.1418, -77.2244) | 3.0 | 3.0 | 36.0 | 33.0 | 21.0 | 3.0 | 5.0 | 1.0 | 4.0 |
5 rows × 38 columns
# Calculate descriptive statistics about numerical variables in dataset
montgomery_crime_df.describe()
| Offence Code | CR Number | Victims | Zip Code | Address Number | Latitude | Longitude | Council Districts | Councils | Communities | Zip Codes | Municipalities | Council Districts_from_i23j_3mj8 | Service Regions | Montgomery County Boundary | Council Districts 7 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 279992.000000 | 2.799920e+05 | 279992.000000 | 277598.000000 | 2.533180e+05 | 279992.000000 | 279992.000000 | 276220.000000 | 276220.000000 | 273817.000000 | 277334.000000 | 275385.000000 | 276220.000000 | 274351.000000 | 274100.0 | 276237.000000 |
| mean | 2802.817538 | 1.785562e+08 | 1.028054 | 20877.247509 | 8.530961e+03 | 38.709476 | -76.380807 | 3.246278 | 3.246278 | 27.601161 | 421.351381 | 4.186887 | 3.246278 | 3.437713 | 1.0 | 4.069082 |
| std | 1193.342610 | 6.111787e+07 | 0.209805 | 163.822120 | 1.799520e+04 | 3.790282 | 7.478188 | 1.387858 | 1.387858 | 16.085272 | 995.212405 | 6.534121 | 1.387858 | 1.413078 | 0.0 | 2.018775 |
| min | 901.000000 | 1.004668e+07 | 1.000000 | 6.000000 | 0.000000e+00 | 0.000000 | -77.516700 | 1.000000 | 1.000000 | 1.000000 | 6.000000 | 1.000000 | 1.000000 | 1.000000 | 1.0 | 1.000000 |
| 25% | 2303.000000 | 1.800087e+08 | 1.000000 | 20854.000000 | 1.700000e+03 | 39.021810 | -77.198300 | 2.000000 | 2.000000 | 16.000000 | 21.000000 | 1.000000 | 2.000000 | 2.000000 | 1.0 | 2.000000 |
| 50% | 2399.000000 | 1.900553e+08 | 1.000000 | 20878.000000 | 8.200000e+03 | 39.072190 | -77.098400 | 3.000000 | 3.000000 | 25.000000 | 39.000000 | 1.000000 | 3.000000 | 4.000000 | 1.0 | 4.000000 |
| 75% | 2999.000000 | 2.100518e+08 | 1.000000 | 20904.000000 | 1.240000e+04 | 39.142190 | -77.029700 | 5.000000 | 5.000000 | 39.000000 | 71.000000 | 1.000000 | 5.000000 | 5.000000 | 1.0 | 6.000000 |
| max | 9103.000000 | 2.301015e+08 | 22.000000 | 29878.000000 | 2.090600e+06 | 39.342600 | 0.000000 | 5.000000 | 5.000000 | 62.000000 | 3065.000000 | 24.000000 | 5.000000 | 5.000000 | 1.0 | 7.000000 |
# Display shape of dataset
df_shape = montgomery_crime_df.shape
# Display column types
df_column_types = montgomery_crime_df.dtypes
# Identify columns with missing values
missing_columns = montgomery_crime_df.columns[montgomery_crime_df.isnull().any()]
print("Columns with missing values:")
print(montgomery_crime_df[missing_columns].isnull().sum())
Columns with missing values: Dispatch Date / Time 40628 End_Date_Time 139934 Police District Name 476 Block Address 26773 City 13 State 5226 Zip Code 2394 PRA 6 Address Number 26674 Street Prefix 267418 Street Name 477 Street Suffix 275349 Street Type 785 Council Districts 3772 Councils 3772 Communities 6175 Zip Codes 2658 Municipalities 4607 Council Districts_from_i23j_3mj8 3772 Service Regions 5641 Montgomery County Boundary 5892 Council Districts 7 3755 dtype: int64
We want to analyze the different types of crimes that are committed, and potentially do further categorization, so we first need to determine the existing categorization provided by this dataset.
# Display value counts for crime types to gather initial information about segments
montgomery_crime_df['Crime Name1'].value_counts()
Crime Name1 Crime Against Property 175277 Crime Against Society 67030 Crime Against Person 37685 Name: count, dtype: int64
# Display value counts for sub-category of crime type
montgomery_crime_df['Crime Name2'].value_counts()
Crime Name2 Theft From Motor Vehicle 31259 Simple Assault 26114 Drug/Narcotic Violations 25234 Destruction/Damage/Vandalism of Property 23503 Shoplifting 21444 All other Larceny 17028 Driving Under the Influence 16797 Theft from Building 13527 Burglary/Breaking and Entering 10670 Identity Theft 10357 Motor Vehicle Theft 10178 Theft of Motor Vehicle Parts or Accessories 9251 False Pretenses/Swindle/Confidence Game 8143 Liquor Law Violations 6999 Disorderly Conduct 6367 Credit Card/Automatic Teller Machine Fraud 6137 Aggravated Assault 5428 Trespass of Real Property 4467 Robbery 4344 Counterfeiting/Forgery 3756 Weapon Law Violations 2932 Drug Equipment Violations 2314 Forcible Fondling 2158 Forcible Rape 1709 Impersonation 1665 Family Offenses, NonViolent 923 Pocket/picking 878 Forcible Sodomy 730 Embezzlement 562 Extortion/Blackmail 548 Purse-snatching 508 Sexual Assault With An Object 506 Intimidation 495 Arson 440 Pornography/Obscene Material 385 Bad Checks 362 Wire Fraud 337 Stolen Property Offenses 263 Prostitution 206 Murder and Nonnegligent Manslaughter 153 Peeping Tom 134 Drunkenness 117 Statuory Rape 108 Purchasing Prostitution 97 Kidnapping/Abduction 96 Animal Cruelty 94 From Coin/Operated Machine or Device 72 Human Trafficking, Commercial Sex Acts 66 Welfare Fraud 43 Curfew/Loitering/Vagrancy Violations 38 Assisting or Promoting Prostitution 22 Justifiable Homicide 14 Incest 7 Bribery 2 Negligent Manslaughter 2 Human Trafficking, Involuntary Servitude 2 Operating/Promoting/Assisting Gambling 1 Name: count, dtype: int64
# Display value counts for final sub-category of crime type
montgomery_crime_df['Crime Name3'].value_counts()
Crime Name3
LARCENY - FROM AUTO 31259
ASSAULT - 2ND DEGREE 21998
LARCENY - SHOPLIFTING 21444
DRUGS - MARIJUANA - POSSESS 16486
DRIVING UNDER THE INFLUENCE LIQUOR 15967
...
BRIBERY - OFFERING BRIBE 1
HOMICIDE - JUSTIFIABLE 1
HOMICIDE - JOHN OR JANE DOE - NO WARRANT 1
HOMOSEXUAL ACT WITH MAN 1
COMM SEX OFF - TRANSPORT FEMALE INTERSTATE FOR IMM 1
Name: count, Length: 268, dtype: int64
We have several columns with thousands of observations that are null. There is also the potential that our data may be corrupted, improperly formatted, duplicated, or incomplete. In order to rectify these issues, we will need to clean this dataset prior to performing any analyses, as inaccurate data can lead to unreliable algorithms/results.
Before we proceed with handling the null data, we should determine if there are any duplicates. We will do this by looking specifically at the Incident ID - duplicates may have resulted from errors in data entry, data extraction, or other data collection processes.
# Identify duplicate records based on the Incident ID
duplicate_records = montgomery_crime_df.index.duplicated()
#montgomery_crime_df[duplicate_records]
# Display the number of duplicate records
print('Number of duplicate records: ', montgomery_crime_df.index.duplicated().sum())
Number of duplicate records: 25371
# Remove duplicated records, keeping the first instance because df is sorted in descending order by time
montgomery_crime_df = montgomery_crime_df[~montgomery_crime_df.index.duplicated(keep='first')]
# montgomery_crime_df
We will remove several columns from our dataset for the following reasons.
Police District Number: The information about the police district is already captured in the 'Police District Name' column. Including both columns is redundant, and the 'Police District Name' provides sufficient information for analysis.
Zip Codes: The 'Zip Code' column already contains information about the location. Including both 'Zip Code' and 'Zip Codes' is redundant, and 'Zip Code' is more relevant for location-based analysis.
Municipalities: The dataset already includes the 'City' column, which represents the municipality or city associated with each incident. Including 'Municipalities' is redundant and does not provide additional meaningful information.
Council Districts_from_i23j_3mj8: The purpose and origin of this column are not explicitly defined, and its inclusion does not contribute to the analysis. It lacks clear relevance or significance for the crime-related insights sought from the dataset.
Service Regions: The role and significance of 'Service Regions' in the context of crime analysis are not apparent. Without a clear understanding of its relevance, this column is excluded from the dataset.
Council Districts 7: The dataset already contains 'Council Districts,' and the additional 'Council Districts 7' column lacks justification or documentation for its inclusion. It is prudent to rely on the more standard 'Council Districts' column for analysis.
# Drop unnecessary columns
montgomery_crime_df.drop(['Police District Number','Zip Codes', 'Municipalities', 'Council Districts_from_i23j_3mj8', 'Service Regions', 'Council Districts 7'], axis=1, inplace=True)
# montgomery_crime_df
Add markdown....
There are multiple cities that are mispelled in our dataset. Because we plan to use this column in our analysis, we need to figure out how to fix the incorrect values.
# Replace the NaN values with an empty string for consistency
montgomery_crime_df['City'] = montgomery_crime_df['City'].fillna('')
# Define a mapping for corrections
city_corrections = {
'SILVER SPRING': ['SILVER SPRIN G', 'SILVERS SPRING', 'SILER SPRING', 'SILVE SPRING', 'SILVERSPRING', 'SILVER APRING', 'SILVER SPRIG', 'SILVER SPRING`', 'SLVER SPRING', 'SLIVER SPRING', 'SILV ER SPRING', 'SILVE4R SPRING', 'SILVER SPRING`', 'SILVER SPRNIG', 'SILVER SPRIING', 'SILVER SPRIND', 'SILVER SPRIGN'],
'ROCKVILLE': ['ROCKVIILE', 'ROCKVIL', 'ROCKVIILE', 'ROCKVILEE', 'ROCKVILLE,', 'ROCKIVLLE', 'ROCKVILLE,', 'ROCKVIILLE', 'ROCVILLE', 'ROCVKILLE', 'ROCKVIILE'],
'GAITHERSBURG': ['GAITHERBURG', 'GAITHERBURG', 'GAITHERSBUG', 'GAITHERSBUIRG', 'GAITHERSBURT', 'GAITHERSBRUG', 'GAITHERSGURG', 'GAITHESRBURG', 'GAITERSBURG', 'GAITHERSRBURG', 'GAIHTERSBURG', 'GAUTHERSBURG', 'GAIHTERSBURG', 'GAIHERSBURG', 'GATIHERSBURG', 'GAITHERBSURG', 'GAITHERSBRG', 'GAITHERSBRG', 'GAITHERSURG', 'GAITHRESBURG', 'GAISTHERSBURG', 'GAITHERSURG', 'GITHERSBURG', 'GITHERSBYRG'],
'BETHESDA': ['BEHTESDA', 'BEHESDA', 'BETHEDA', 'BETHESDAS', 'BETHESDAS', 'BETHSDA', 'BETESDA', 'BETHESDA'],
'TAKOMA PARK': ['TACOMA PARK', 'TAKOMA PARK'],
# Add more entries as needed
}
city_corrections_manual = {
'BETHESDA' : ['BEHTESDA', 'BEHESDA', 'BETESDA', 'BETHEDA', 'BETHESA', 'BETHESDAS', 'BETHSDA'],
'Beallsville': ['BEALSVILLE'],
'BARNESVILLE' : ['BARNESVIILE','BARNESVIILE','BARNSVILLE'],
'BROOKEVILLE' : 'BROOKVILLE',
'BURTONSVILLE' : ['BURTOSNVILLE','BURTSONVILLE', 'BUTINSVILLE'],
'CHEVY CHASE': ['CEHVY CHASE', 'CHVEY CHASE', 'CHEVY CHASE #3', 'CHEVY CHASE #4', 'CHEVY CHASE VIEW', 'CHEVY CHASE VILLAGE'],
'CLARKSBURG': ['CALARKSBURG', 'CLAEKSBURG','CLARKESBURG','CLARKSURG', 'CLARSBURG','CLARSKBURG'],
'COMUS': 'COMUS',
'DAMASCUS': 'DANASCUS',
'DERWOOD' : 'DEERWOOD',
'FREDERICK': 'FREDERICK',
'FRIENDSHIP HEIGHTS': 'FRIENDHSIP HEIGHTS',
'GAITHERSBURG' : ['7','GATIHERSBURG', 'GAIHTERSBURG', 'GAITHERBURG', 'GAITHERESBURG','GAITEHRSBURG', 'GITHERSBURG', 'GAIHERSBURG', 'GAISTHERSBURG', 'GAITERSBURG', 'GAITHERBSURG', 'GAITHERSBRG', 'GAITHERSBRUG', 'GAITHERSBUG', 'GAITHERSBUIRG', 'GAITHERSBURT', 'GAITHERSBYRG', 'GAITHERSGURG', 'GAITHERSRBURG', 'GAITHERSSBURG', 'GAITHERSURG', 'GAITHESBURG', 'GAITHESRBURG', 'GAITHRERSBURG', 'GAITHRESBURG', 'GIATHERSBURG', 'GATHERSBURG', 'GAUTHERSBURG'],
'GARRETT PARK' : 'GARRETT PARK',
'GERMANTOWN' : ['4','GRMANTOWN', 'GEMANTOWN', 'GERMATOWN', 'GERMANTNOWN', 'GERMANTOWM', 'GERMANTOOWN', 'GERMANTOEN', 'GERMANTONW', 'GERMANTIWN', 'GEERMANTOWN', 'GERAMNTOWN', 'GERANTOWN', 'GERMAN4TOWN', 'GERMANTOW', 'GERMANTOWNMD','GERMANTWN', 'GERMANTWON', 'GERMNATOWN', 'GERMNTOWN', 'GERRMANTOWN'],
'GLEN ECHO': 'GLEN ECHO`',
'HYATTSTOWN': ['HYATTTOWN','1'],
'KENSINGTON': ['KENSIGNTON', 'KENSINGTOWN', 'KENSONGTON', 'KENSTINGTON', 'KENNSINGTON', 'KENSINGTNO'],
'LAYTONSVILLE': 'LATONSVILLE',
'MONTGOMERY VILLAGE': ['20877','6','MCG','MCGGAITHERSBURG', 'MONT VILLAGE', 'MONTGGOMERY VILLAGE', 'MONTGOMERY VILAGE','MONTGOMERY VILLAE', 'MONTGOMERY VILLLAGE', 'MONTGOMRY VILLAGE', 'MONTOMGERY VILLAGE','MOTGOMERY VILLAGE', 'MOMTGOMERY VILLAGE', 'MONGTOMERY VILLAGE'],
'MOUNT AIRY': ['MT AIRY', 'MT. AIRY'],
'NORTH BETHESDA': ['N BETHESDAQ', 'N. BETHESDA', 'N. POTOMAC', 'NORTH BEHTESDA', 'NORTH BETHSDA','NORTH POTOAMC', 'NOTRTH POTOMAC', 'N BETHESDA'],
'NORTH POTOMAC' : 'N POTOMAC',
'OLNEY': ['ONLEY','ONEY'],
'POOLESVILLE': ['POOLSVILLE','PO'],
'POTOMAC': 'POTIMAC',
'ROCKVILLE': ['3','ROCKIVLLE', 'ROCKVILLLE', 'ROCKVIILLE', 'ROCVILLE', 'ROCVKILLE', 'RCKVILLE', 'ROCKILLE', 'ROCKIVILLE', 'ROCKVIILE', 'ROCKVILEE', 'ROCKVILL', 'ROCKVLLE', 'ROKVILLE', 'ROOCKVILLE', "ROCKVILLE'", 'ROCKVILLE,','ROCKVILE'],
'SANDY SPRING': 'SANDY SPPRING',
'SILVER SPRING': ['2','SILVER SPRING','SLIVER SPRING','SILVE4R SPRING', 'SILVER', 'SILVER APRING', 'SILVER SPING', 'SILVER SPRIING', 'SILVER SPRIND', 'SILVER SPRING', 'SILVER SPRINGQ', 'SILVER SPRIGN','SILVER SRING', 'SILVER SRING', 'SIVLER SPRING', 'SLVER SPRING', 'SIVER SPRING', 'SILER SPRING', 'SILV ER SPRING', 'SILVE SPRING', 'SILVER SPRING', 'SILVER SPIRNG','SILVER SPRIG', 'SILVER SPRIN', 'SILVER SPRIN G', 'SILVER SPRING`', 'SILVER SPRNG','SILVER SPRNIG', 'SILVER SPSRING', 'SILVER SRPING', 'SILVERS SPRING', 'SILVERSPRING','SILVR SPRING'],
'SPENCERVILLE': 'APENCERVILLE',
'TAKOMA PARK': ['TAKOMS PARK','TACOMA PARK', 'TAKOMA','TP'],
'VALLEYWOOD': 'VALLEYWOOD',
'WHEATON': ['WEHATON','WEATON']
}
# not sure we need to display this
# city_corrections_manual.keys()
# Function to correct misspelled city names
def correct_city_name(city):
for correct_name, misspelled_names in city_corrections_manual.items():
if city in misspelled_names:
return correct_name
return city # Return the original city name if not found in the misspelled names
# Apply the correction function to the 'City' column and display results
montgomery_crime_df['City'] = montgomery_crime_df['City'].apply(correct_city_name)
#print(len(montgomery_crime_df['City'].unique()))
#print(sorted(montgomery_crime_df['City'].unique()))
There are some corrupted values, specifically in the Dispatch Date/Time, Start_Date_Time, and Agency columns, that are not feasible to impute. Instead, we will remove them from the dataset.
# Filter out rows where either dispatch date/time or start date time are missing
montgomery_crime_df = montgomery_crime_df[montgomery_crime_df['Dispatch Date / Time'].notnull() & montgomery_crime_df['Start_Date_Time'].notnull()]
# Filter out rows where dispatch date/time is prior to start date time
montgomery_crime_df = montgomery_crime_df[montgomery_crime_df['Dispatch Date / Time'] >= montgomery_crime_df['Start_Date_Time']]
# Drop rows where Agency = P, since this is a meaningless value
montgomery_crime_df.drop(montgomery_crime_df[montgomery_crime_df['Agency'] == 'P'].index, inplace=True)
montgomery_crime_df.drop(montgomery_crime_df[montgomery_crime_df['Agency'] == 'OTHR'].index, inplace=True)
Some of our columns are not the desired types. Lets change that for all 3 columns that represent a timestamp of a crime observation - start, dispatch, and end.
# Convert 'Dispatch Date / Time' to datetime format
montgomery_crime_df['Dispatch Date / Time'] = pd.to_datetime(montgomery_crime_df['Dispatch Date / Time'])
# Convert 'Start_Date_Time' to datetime format
montgomery_crime_df['Start_Date_Time'] = pd.to_datetime(montgomery_crime_df['Start_Date_Time'])
# Convert 'End_Date_Time' to datetime format
montgomery_crime_df['End_Date_Time'] = pd.to_datetime(montgomery_crime_df['End_Date_Time'])
Lastly, our data covers crimes ranging from 2016 to November 8th, 2023. We decided to filter out any crime data that occurred prior to 2016...
# filter out any observations that occurred prior to 2017
montgomery_crime_df = montgomery_crime_df[montgomery_crime_df['Start_Date_Time'] > '2017-01-01']
| Offence Code | CR Number | Dispatch Date / Time | Start_Date_Time | End_Date_Time | NIBRS Code | Victims | Crime Name1 | Crime Name2 | Crime Name3 | ... | Street Name | Street Suffix | Street Type | Latitude | Longitude | Location | Council Districts | Councils | Communities | Montgomery County Boundary | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Incident ID | |||||||||||||||||||||
| 201452140 | 1205 | 230065034 | 2023-11-08 03:20:12 | 2023-11-08 03:20:00 | NaT | 120 | 1 | Crime Against Property | Robbery | ROBBERY - STREET-OTHER WEAPON | ... | FENTON | NaN | ST | 38.99175 | -77.0241 | (38.9918, -77.0241) | 5.0 | 5.0 | 7.0 | 1.0 |
| 201452138 | 2203 | 230065033 | 2023-11-08 02:43:30 | 2023-11-08 02:30:00 | 2023-11-08 02:42:00 | 220 | 1 | Crime Against Property | Burglary/Breaking and Entering | BURGLARY - FORCED ENTRY-NONRESIDENTIAL | ... | ROCKVILLE | NaN | PIK | 39.05503 | -77.1183 | (39.055, -77.1183) | 3.0 | 3.0 | 19.0 | 1.0 |
| 201452119 | 2601 | 230065015 | 2023-11-07 21:15:52 | 2023-11-07 21:15:00 | NaT | 26A | 1 | Crime Against Property | False Pretenses/Swindle/Confidence Game | FRAUD - CONFIDENCE GAME | ... | BIRCHTREE | NaN | LA | 39.08343 | -77.0652 | (39.0834, -77.0652) | 4.0 | 4.0 | 59.0 | 1.0 |
| 201452132 | 1304 | 230065019 | 2023-11-07 21:06:46 | 2023-11-07 21:06:00 | 2023-11-08 00:00:00 | 13A | 5 | Crime Against Person | Aggravated Assault | ASSAULT - AGGRAVATED - NON-FAMILY-GUN | ... | QUINCE ORCHARD | NaN | BLV | 39.14175 | -77.2244 | (39.1418, -77.2244) | 3.0 | 3.0 | 36.0 | 1.0 |
| 201452133 | 3615 | 230065007 | 2023-11-07 20:54:34 | 2023-11-07 20:54:00 | NaT | 90C | 1 | Crime Against Society | Disorderly Conduct | SEX OFFENSE - INDECENT EXPOSURE TO ADULT | ... | FREDERICK | NaN | AVE | 39.14886 | -77.2077 | (39.1489, -77.2077) | 3.0 | 3.0 | 36.0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 201368726 | 2502 | 220011575 | 2022-03-17 18:44:08 | 2017-01-01 07:00:00 | 2022-03-17 16:49:00 | 250 | 1 | Crime Against Property | Counterfeiting/Forgery | FORGERY OF OTHER | ... | MONTROSE | NaN | AVE | 39.02585 | -77.1008 | (39.0258, -77.1008) | 1.0 | 1.0 | 12.0 | 1.0 |
| 201181181 | 2610 | 180015001 | 2018-03-27 15:04:07 | 2017-01-01 06:00:00 | 2017-12-31 00:00:00 | 26F | 1 | Crime Against Property | Identity Theft | FRAUD - IDENTITY THEFT | ... | BRASSIE | NaN | PL | 39.17418 | -77.2022 | (39.1742, -77.2022) | 2.0 | 2.0 | 38.0 | 1.0 |
| 201133509 | 2604 | 170506789 | 2017-05-09 10:46:29 | 2017-01-01 00:01:00 | 2017-01-01 23:59:00 | 26C | 1 | Crime Against Property | Impersonation | FRAUD - IMPERSONATION | ... | CHESHIRE | NaN | DR | 39.02616 | -77.1119 | (39.0262, -77.1119) | 1.0 | 1.0 | 19.0 | 1.0 |
| 201136683 | 2604 | 170510782 | 2017-05-30 21:51:31 | 2017-01-01 00:01:00 | 2017-05-30 21:00:00 | 26C | 1 | Crime Against Property | Impersonation | FRAUD - IMPERSONATION | ... | MAIN | NaN | ST | 39.28768 | -77.2017 | (39.2877, -77.2017) | 2.0 | 2.0 | 51.0 | 1.0 |
| 201163464 | 2604 | 170544271 | 2017-11-22 10:03:32 | 2017-01-01 00:01:00 | 2017-09-30 23:59:00 | 26C | 1 | Crime Against Property | Impersonation | FRAUD - IMPERSONATION | ... | GREY COLT | NaN | DR | 39.08376 | -77.2363 | (39.0838, -77.2363) | 2.0 | 2.0 | 24.0 | 1.0 |
186980 rows × 32 columns
Now that we've done that, we want to perform some grouping operations that will be used later on in our analysis. The Start Time, which we have defined as when a crime is reported by either a citizen or the police, is what we will focus on as it is the best representation of "when" a crime occurred.
# Extract month and year information from 'Start_Date_Time'
montgomery_crime_df['Month'] = montgomery_crime_df['Start_Date_Time'].dt.month
montgomery_crime_df['Year'] = montgomery_crime_df['Start_Date_Time'].dt.year
# Mapping numerical months to corresponding names for readability
month_names = {1: 'January', 2: 'February', 3: 'March', 4: 'April', 5: 'May', 6: 'June', 7: 'July',
8: 'August', 9: 'September', 10: 'October', 11: 'November', 12: 'December'}
# Define the order of months for categorical ordering
month_order = [month for month in month_names.values()]
# Create a new column 'Month_Name' and map the month names
montgomery_crime_df['Month_Name'] = montgomery_crime_df['Month'].map(month_names)
# Set 'Month_Name' as a categorical variable w|ith the specified order
montgomery_crime_df['Month_Name'] = pd.Categorical(montgomery_crime_df['Month_Name'], categories=month_order, ordered=True)
# Group by 'Year' and 'Month_Name', and count the occurrences of crimes
incident_groups = montgomery_crime_df.groupby(['Year', 'Month_Name'], observed=True)['Crime Name1'].count()
A different lens through which we can look at this data about when crimes occured is the time of day, or the hour at which they occur. To do this, we will do some data transformation.
# NOT SURE THIS IS NECESSARY
# Extract the hour information from 'Start_Date_Time'
montgomery_crime_df['Hour'] = montgomery_crime_df['Start_Date_Time'].dt.hour
# Group by hour and count the occurrences of crimes
crime_by_hour = montgomery_crime_df.groupby('Hour').size()
The first thing we will look at is the overall crime rate in Montgomery, and how it has changed over time. In order to do this, we will need to transform our data to have it represent time series data that we can visualize. We decided to analyze it by total crime volume each month.
To answer our questions about crime trends in Montgomery County, we should also analyze the time series data for different categories of crime. We will do this by creating a time series line graph that plots the total number of crimes taking place on a month by month basis. We will also take a cross section of the data for each of the 3 sub-categories of crime - Crime Against Person, Crime Against Property, and Crime Against Society.
Crime Against Person is defined as , and includes assault, homicide, and robbery. Crime Against Property is defined as , and includes breaking & entering, car theft, and . Crime Against Society is defined as, and includes .
# Create dummy variables for crime category
crime_name_dummies = pd.get_dummies(montgomery_crime_df['Crime Name1'], prefix='dummy', prefix_sep='_', dummy_na=False, columns=['Crime Name1'], dtype='int')
# merge dummy variables with main dataframe
montgomery_crime_df = montgomery_crime_df.merge(crime_name_dummies, on='Incident ID');
montgomery_crime_df
| Offence Code | CR Number | Dispatch Date / Time | Start_Date_Time | End_Date_Time | NIBRS Code | Victims | Crime Name1 | Crime Name2 | Crime Name3 | ... | Councils | Communities | Montgomery County Boundary | Month | Year | Month_Name | Hour | dummy_Crime Against Person | dummy_Crime Against Property | dummy_Crime Against Society | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Incident ID | |||||||||||||||||||||
| 201452140 | 1205 | 230065034 | 2023-11-08 03:20:12 | 2023-11-08 03:20:00 | NaT | 120 | 1 | Crime Against Property | Robbery | ROBBERY - STREET-OTHER WEAPON | ... | 5.0 | 7.0 | 1.0 | 11 | 2023 | November | 3 | 0 | 1 | 0 |
| 201452138 | 2203 | 230065033 | 2023-11-08 02:43:30 | 2023-11-08 02:30:00 | 2023-11-08 02:42:00 | 220 | 1 | Crime Against Property | Burglary/Breaking and Entering | BURGLARY - FORCED ENTRY-NONRESIDENTIAL | ... | 3.0 | 19.0 | 1.0 | 11 | 2023 | November | 2 | 0 | 1 | 0 |
| 201452119 | 2601 | 230065015 | 2023-11-07 21:15:52 | 2023-11-07 21:15:00 | NaT | 26A | 1 | Crime Against Property | False Pretenses/Swindle/Confidence Game | FRAUD - CONFIDENCE GAME | ... | 4.0 | 59.0 | 1.0 | 11 | 2023 | November | 21 | 0 | 1 | 0 |
| 201452132 | 1304 | 230065019 | 2023-11-07 21:06:46 | 2023-11-07 21:06:00 | 2023-11-08 00:00:00 | 13A | 5 | Crime Against Person | Aggravated Assault | ASSAULT - AGGRAVATED - NON-FAMILY-GUN | ... | 3.0 | 36.0 | 1.0 | 11 | 2023 | November | 21 | 1 | 0 | 0 |
| 201452133 | 3615 | 230065007 | 2023-11-07 20:54:34 | 2023-11-07 20:54:00 | NaT | 90C | 1 | Crime Against Society | Disorderly Conduct | SEX OFFENSE - INDECENT EXPOSURE TO ADULT | ... | 3.0 | 36.0 | 1.0 | 11 | 2023 | November | 20 | 0 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 201368726 | 2502 | 220011575 | 2022-03-17 18:44:08 | 2017-01-01 07:00:00 | 2022-03-17 16:49:00 | 250 | 1 | Crime Against Property | Counterfeiting/Forgery | FORGERY OF OTHER | ... | 1.0 | 12.0 | 1.0 | 1 | 2017 | January | 7 | 0 | 1 | 0 |
| 201181181 | 2610 | 180015001 | 2018-03-27 15:04:07 | 2017-01-01 06:00:00 | 2017-12-31 00:00:00 | 26F | 1 | Crime Against Property | Identity Theft | FRAUD - IDENTITY THEFT | ... | 2.0 | 38.0 | 1.0 | 1 | 2017 | January | 6 | 0 | 1 | 0 |
| 201133509 | 2604 | 170506789 | 2017-05-09 10:46:29 | 2017-01-01 00:01:00 | 2017-01-01 23:59:00 | 26C | 1 | Crime Against Property | Impersonation | FRAUD - IMPERSONATION | ... | 1.0 | 19.0 | 1.0 | 1 | 2017 | January | 0 | 0 | 1 | 0 |
| 201136683 | 2604 | 170510782 | 2017-05-30 21:51:31 | 2017-01-01 00:01:00 | 2017-05-30 21:00:00 | 26C | 1 | Crime Against Property | Impersonation | FRAUD - IMPERSONATION | ... | 2.0 | 51.0 | 1.0 | 1 | 2017 | January | 0 | 0 | 1 | 0 |
| 201163464 | 2604 | 170544271 | 2017-11-22 10:03:32 | 2017-01-01 00:01:00 | 2017-09-30 23:59:00 | 26C | 1 | Crime Against Property | Impersonation | FRAUD - IMPERSONATION | ... | 2.0 | 24.0 | 1.0 | 1 | 2017 | January | 0 | 0 | 1 | 0 |
186980 rows × 39 columns
# Create new column that does not contain time values
montgomery_crime_df['Day'] = montgomery_crime_df['Start_Date_Time'].dt.date
# Count the number of crimes in each month
group_by_date = montgomery_crime_df.groupby('Day', observed=True)['Crime Name1'].count()
group_by_date.index = pd.to_datetime(group_by_date.index)
group_by_month = group_by_date.resample('1M').sum()
# Sum the number of crimes against person for each month in dataset
person_group_by_date = montgomery_crime_df.groupby('Day')['dummy_Crime Against Person'].sum()
# Convert to datetime
person_group_by_date.index = pd.to_datetime(group_by_date.index)
# Resample dataset to aggregate counts based on month
person_group_by_month = person_group_by_date.resample('1M').sum()
# Drop final month since it is incomplete
person_group_by_month.drop(person_group_by_month.tail(1).index,inplace=True)
# Sum the number of crimes against person for each month in dataset
property_group_by_date = montgomery_crime_df.groupby('Day')['dummy_Crime Against Property'].sum()
# Convert to datetime
property_group_by_date.index = pd.to_datetime(group_by_date.index)
# Resample dataset to aggregate counts based on month
property_group_by_month = property_group_by_date.resample('1M').sum()
# Drop final month since it is incomplete
property_group_by_month.drop(property_group_by_month.tail(1).index,inplace=True)
# Sum the number of crimes against person for each month in dataset
society_group_by_date = montgomery_crime_df.groupby('Day')['dummy_Crime Against Society'].sum()
# Convert to datetime
society_group_by_date.index = pd.to_datetime(group_by_date.index)
# Resample dataset to aggregate counts based on month
society_group_by_month = society_group_by_date.resample('1M').sum()
# Drop final month since it is incomplete
society_group_by_month.drop(society_group_by_month.tail(1).index,inplace=True)
# Drop the final value in the total crime rate series
group_by_month.drop(group_by_month.tail(1).index,inplace=True)
# Initialize values for use in our animated plot
crime_rate = group_by_month.values
t = group_by_month.index
x,y = [], []
person_crime_rate = person_group_by_month.values
y1 = []
property_crime_rate = property_group_by_month.values
y2 = []
society_crime_rate = society_group_by_month.values
y3 = []
%%capture
# create matplotlib figure and axes
fig = plt.figure(figsize=(14,8));
axes = fig.add_subplot(1,1,1);
# set initial values and configuration
axes.set_ylim(0, 3000)
axes.set_title('Montgomery County Crime Rate', fontsize=20)
axes.set_xlabel('Month/Year')
axes.set_ylabel('Number of Crimes')
crime_rate_plot = plt.plot([], [])
# create legend for plot
all_crime = plt.plot(x,y, color='black', marker='*', scaley=True, scalex=True, label='All Crime')
person_crime = plt.plot(x,y1, 'r-.', scaley=True, scalex=True, label='Crime Against Person')
property_crime = plt.plot(x,y2, 'b--', scaley=True, scalex=True, label='Crime Against Property')
society_crime = plt.plot(x,y3, 'g:', scaley=True, scalex=True, label='Crime Against Society')
legend = plt.legend(loc=1)
# Define function that will animate the plot
def animate(i):
# append new values to the arrays that will be plotted
x.append(t[i])
y.append(crime_rate[i])
y1.append(person_crime_rate[i])
y2.append(property_crime_rate[i])
y3.append(society_crime_rate[i])
# plot crime rates
all_crime = plt.plot(x,y, color='black', marker='*', scaley=True, scalex=True)
person_crime = plt.plot(x,y1, 'r-.', scaley=True, scalex=True)
property_crime = plt.plot(x,y2, 'b--', scaley=True, scalex=True)
society_crime = plt.plot(x,y3, 'g:', scaley=True, scalex=True)
# ADD ARROWS AND ANIMATIONS
if i > 40:
# plot marker for first US case of covid 19
plt.text(dt.datetime.strptime('03/10/2019', '%m/%d/%Y'), 900, 'First Covid-19\ncase reported\nin U.S.', color='grey', fontsize='large', alpha=0.1)
plt.plot([dt.datetime.strptime('01/20/2020', '%m/%d/%Y')] * 100, range(0, 4000, 40), '--', linewidth=1, alpha=0.01)
# plot marker for official start of covid pandemic
plt.text(dt.datetime.strptime('03/20/2020', '%m/%d/%Y'), 800, 'WHO declares\nCovid-19 Pandemic', color='grey', fontsize='large', alpha=0.1)
plt.plot([dt.datetime.strptime('03/11/2020', '%m/%d/%Y')] * 100, range(0, 4000, 40), '--', linewidth=1, alpha=0.01)
# create animation using animate function
anim = FuncAnimation(fig=fig, func=animate, frames=len(group_by_month), interval=400, repeat=False)
# show plot and enable viewing in html
display(HTML(anim.to_jshtml()))
# Automatically play the animation once it is generated in HTML
Javascript('document.querySelector(".anim-buttons > button:nth-child(6)").click()')